Semi-Supervised Model Training for Unbounded Conversational Speech Recognition

Authors

Shane Walker

Morten Pedersen

Iroro Orife

Jason Flaks

Abstract

For conversational large-vocabulary continuous speech recognition (LVCSR) tasks, up to about two thousand hours of audio is commonly used to train state-of-the-art models. Collection of labeled conversational audio, however, is prohibitively expensive, laborious, and error-prone. Furthermore, academic corpora like Fisher English (2004) or Switchboard (1992) are inadequate to train models with sufficient accuracy in the unbounded space of conversational speech. These corpora are also timeworn due to dated acoustic telephony features and the rapid advancement of colloquial vocabulary and idiomatic speech over the last decades. Utilizing the colossal scale of our unlabeled telephony dataset, we propose a technique to construct a modern, high-quality conversational speech training corpus on the order of hundreds of millions of utterances (or tens of thousands of hours) for both acoustic and language model training. We describe the data collection, selection, and training, and evaluate the results of our updated speech recognition system on a test corpus of 7K manually transcribed utterances. We show relative word error rate (WER) reductions of 35% on agent utterances and 19% on caller utterances over our seed model, and a 5% absolute WER improvement over IBM Watson STT on this conversational speech task.
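The abstract reports both relative and absolute WER changes, which are easy to confuse. The sketch below is illustrative only and is not the authors' evaluation code: it computes WER as word-level edit distance divided by the reference length, then contrasts the two kinds of reduction. The function names and the example WER values (0.20 and 0.15) are assumptions made up for illustration; they are not figures from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, expressed in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was eliminated."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical WERs: 0.20 for a baseline, 0.15 for an improved model.
    print(absolute_reduction(0.20, 0.15))
    print(relative_reduction(0.20, 0.15))
```

With these made-up values, the same improvement reads as 5 points absolute but 25% relative, which is why the two figures in the abstract are not directly comparable.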


Similar resources

Semi-Supervised Learning for Acoustic

Enormous amounts of audio recordings of human speech are essential ingredients for building reliable statistical models for many speech applications, such as automatic speech recognizers and automatic prosody detectors. However, most of this speech data is not being utilized because it lacks transcriptions. The goal of this thesis is to use untranscribed (unlabeled) data to improve the perfor...


Discriminative optimization of large vocabulary Mandarin conversational speech recognition system

This paper examines discriminative optimization techniques for acoustic models, including both HMM parameters and linear transforms, in the context of the HUB5 Mandarin large-vocabulary speech recognition task, with the aim of partly solving the problems caused by the sparseness and highly ambiguous nature of conversational telephony speech data. Three techniques are studied: MMI training ...


Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles

Arabic speech recognition suffers from the scarcity of properly labeled data. In this project, we introduce a pipeline that performs semi-supervised segmentation of audio and then, after hand-labeling a small dataset, feeds labeled segments to a supervised learning framework to select, through many rounds of hyperparameter optimization, an ensemble of models to infer labels for a larger dataset; usi...


Semi-Supervised Training of Language Model on Spanish Conversational Telephone Speech Data

This work addresses one of the common issues that arise when building a speech recognition system in a low-resource scenario: adapting the language model on unlabeled audio data. The proposed methodology makes use of such data by means of semi-supervised learning. Whilst it has been proven that adding system-generated labeled data for acoustic modeling yields good results, the benefits of addin...


A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain

The quality of a speech signal degrades significantly in the presence of environmental noise, which leads to imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, single-channel speech enhancement of signals corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...



Journal:

CoRR

Volume: abs/1705.09724

Publication date: 2017
